Free Solved [December 2023] BCS40 - Statistical Techniques Question Paper

Welcome to KnowledgeKnot!

Question 1. (a) An electric bulb manufacturing company chooses a random sample of 10 bulbs, received from one of the suppliers. It determines the life of each bulb. The results (in thousands of hours) are as follows: 3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5. Compute and analyse a point estimate of the mean length of the life of the bulbs received from the supplier.
(b) Compare parametric and non-parametric tests. (2 Marks)

Solution:

(a) The sample mean $\bar{x}$ is used as the point estimate of the mean life of the bulbs received from the supplier.

Given data: 3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5

The question states a sample of 10 bulbs, but 11 observations are listed; all 11 listed values are used here, so $n = 11$.

To find the sample mean:

$\bar{x} = \frac{3 + 4.5 + 5.0 + 4.2 + 4.8 + 4.2 + 5.1 + 4.0 + 4.2 + 4.2 + 4.5}{11}$

$\bar{x} = \frac{47.7}{11} \approx 4.34 \text{ thousand hours}$

Therefore, the point estimate of the mean life of the bulbs received from the supplier is approximately $4.34$ thousand hours.

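The sample mean is an unbiased estimator of the population mean, so it is a reasonable single-value estimate for all bulbs from this supplier. Most lives lie between 4.0 and 5.1 thousand hours, but one bulb lasted only 3 thousand hours; without it, the mean of the remaining 10 values would be 4.47 thousand hours, so the estimate is sensitive to this single low value. The sample standard deviation is about $0.57$ thousand hours, and with so few bulbs the estimate carries noticeable sampling error (standard error about $0.17$ thousand hours), so it should be accompanied by a confidence interval before drawing firm conclusions about the whole supply.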
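The figures used in the analysis can be checked with a minimal Python sketch using only the standard library. It assumes that all 11 listed values belong to the sample, even though the question mentions 10 bulbs; the variable names are illustrative.

```python
import math
import statistics

# Bulb lives in thousands of hours, as listed in the question (11 values).
lives = [3, 4.5, 5.0, 4.2, 4.8, 4.2, 5.1, 4.0, 4.2, 4.2, 4.5]

n = len(lives)
mean = statistics.mean(lives)   # point estimate of the population mean
sd = statistics.stdev(lives)    # sample standard deviation (divisor n - 1)
se = sd / math.sqrt(n)          # standard error of the sample mean

# Sensitivity check: mean of the sample with the single low value (3) removed.
mean_without_low = statistics.mean([x for x in lives if x != 3])

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}, SE = {se:.2f}")
print(f"mean without the value 3 = {mean_without_low:.2f}")
```

The standard error shrinks with $\sqrt{n}$, which is why a sample of this size gives only a rough estimate.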

(b) Parametric tests and non-parametric tests are two broad categories of statistical tests used for hypothesis testing.

Parametric tests assume that the data being analyzed follow a specific probability distribution, often the normal distribution. Examples of parametric tests include the z-test, t-tests, ANOVA, and tests on linear regression coefficients. These tests typically require the data to be measured on an interval or ratio scale, to be approximately normally distributed, and, when groups are compared, to have equal variances.

Non-parametric tests, on the other hand, make fewer assumptions about the data distribution. They are used when the data do not meet the assumptions of parametric tests or when the data are ordinal or categorical. Examples of non-parametric tests include the Wilcoxon signed-rank test, Mann-Whitney U test, and Kruskal-Wallis test.

In summary, parametric tests are more powerful when their assumptions are met, but non-parametric tests are more robust and can be used in a wider range of situations where parametric assumptions are violated or when dealing with non-normally distributed data.

Question 2. An insurance company has insured 1000 truck drivers, 3000 car drivers and 6000 scooter drivers. The probabilities that the truck, car and scooter drivers meet with an accident are $0.2$ , $0.04$ and $0.25$ , respectively. One of the insured persons meets with an accident. What is the probability that the person is a car driver? (5 Marks)

Solution:

To find the probability that the person involved in the accident is a car driver, we use Bayes' theorem.
Let $A$ be the event that a randomly chosen insured person is a car driver.
Let $B$ be the event that the insured person meets with an accident.
We want to find $P(A | B)$ , the probability that the person is a car driver given that they met with an accident.
According to Bayes' theorem:
$P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}$
Given:
- $P(A)$ = proportion of insured persons who are car drivers = $\frac{3000}{1000 + 3000 + 6000} = \frac{3000}{10000} = 0.3$
- $P(B | A)$ = probability that a car driver meets with an accident = $0.04$ (as provided in the question)
- $P(B)$ = total probability that an insured person meets with an accident. By the law of total probability, each driver type's accident probability is weighted by its number of insured drivers:
$P(B) = \frac{1000 \times 0.2 + 3000 \times 0.04 + 6000 \times 0.25}{10000}$
$P(B) = \frac{200 + 120 + 1500}{10000}$
$P(B) = \frac{1820}{10000}$
$P(B) = 0.182$
Now, substitute the values into Bayes' theorem:
$P(A | B) = \frac{0.04 \times \frac{3000}{10000}}{0.182}$
$P(A | B) = \frac{0.04 \times 0.3}{0.182}$
$P(A | B) = \frac{0.012}{0.182}$
$P(A | B) \approx 0.0659$
Therefore, the probability that the person involved in the accident is a car driver is approximately $0.0659$ or $6.59\%$ .
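The same Bayes computation can be sketched in Python. The dictionary names are illustrative; the sketch also computes the posterior probabilities for truck and scooter drivers, which must sum to 1 together with the car-driver value, a handy arithmetic check.

```python
# Insured drivers and accident probabilities as given in the question.
drivers = {"truck": 1000, "car": 3000, "scooter": 6000}
p_accident = {"truck": 0.2, "car": 0.04, "scooter": 0.25}

total = sum(drivers.values())
# Prior P(type) and joint P(type and accident) for each driver type.
prior = {k: v / total for k, v in drivers.items()}
joint = {k: prior[k] * p_accident[k] for k in drivers}

# Law of total probability: P(B) = sum of the joint probabilities.
p_b = sum(joint.values())

# Bayes' theorem: P(type | accident) = P(type and accident) / P(accident).
posterior = {k: joint[k] / p_b for k in drivers}

print(f"P(B) = {p_b:.3f}")
for k, v in posterior.items():
    print(f"P({k} | accident) = {v:.4f}")
```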

Question 3. A football manufacturing company wants to check the variation in the weight of its balls. For this, 25 samples (each of size 4) are selected and the weight of each ball is measured (in grams). The sum of the sample averages and the sum of the sample ranges were found to be $\sum_{i=1}^{25} \bar{x}_i = 4010$ grams and $\sum_{i=1}^{25} R_i = 72$ grams, respectively. Compute the control limits for the $\bar{X}$ and R-charts. It is given that $A2 = 0.729$ , $D3 = 0$ and $D4 = 2.282$ (5 Marks).

Solution:

Given:

Number of samples ( $k$ ) = 25
Sample size ( $n$ ) = 4
Sum of sample averages ( $\sum_{i=1}^{25} \bar{x}_i$ ) = 4010 grams
Sum of sample ranges ( $\sum_{i=1}^{25} R_i$ ) = 72 grams
Constants: $A2 = 0.729$ , $D3 = 0$ , $D4 = 2.282$

Step-by-Step Solution:

1. Calculate the average of sample means ( $\bar{\bar{x}}$ ):
$\bar{\bar{x}} = \frac{\sum_{i=1}^{25} \bar{x}_i}{25} = \frac{4010}{25} = 160.4 \text{ grams}$

2. Calculate the average range ( $\bar{R}$ ):
$\bar{R} = \frac{\sum_{i=1}^{25} R_i}{25} = \frac{72}{25} = 2.88 \text{ grams}$

3. Control Limits for $\bar{X}$ -Chart:
The control limits for the $\bar{X}$ -chart are calculated as follows:

Upper Control Limit (UCL):
$\text{UCL}_{\bar{X}} = \bar{\bar{x}} + A2 \cdot \bar{R}$
$\text{UCL}_{\bar{X}} = 160.4 + 0.729 \cdot 2.88 = 160.4 + 2.09952 = 162.49952 \approx 162.50 \text{ grams}$

Center Line (CL):
$\text{CL}_{\bar{X}} = \bar{\bar{x}} = 160.4 \text{ grams}$

Lower Control Limit (LCL):
$\text{LCL}_{\bar{X}} = \bar{\bar{x}} - A2 \cdot \bar{R}$
$\text{LCL}_{\bar{X}} = 160.4 - 0.729 \cdot 2.88 = 160.4 - 2.09952 = 158.30048 \approx 158.30 \text{ grams}$

4. Control Limits for R-Chart:
The control limits for the R-chart are calculated as follows:

Upper Control Limit (UCL):
$\text{UCL}_R = D4 \cdot \bar{R}$
$\text{UCL}_R = 2.282 \cdot 2.88 = 6.57216 \approx 6.57 \text{ grams}$

Center Line (CL):
$\text{CL}_R = \bar{R} = 2.88 \text{ grams}$

Lower Control Limit (LCL):
$\text{LCL}_R = D3 \cdot \bar{R}$
$\text{LCL}_R = 0 \cdot 2.88 = 0 \text{ grams}$
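The control-limit formulas above can be wrapped in a small Python helper for checking the arithmetic. The function name `xbar_r_limits` is hypothetical, and the constants are the ones given in the question (the standard table values for subgroups of size 4).

```python
def xbar_r_limits(sum_xbar, sum_r, k, a2, d3, d4):
    """Return (LCL, CL, UCL) tuples for the X-bar chart and the R chart."""
    xbarbar = sum_xbar / k   # grand mean of the k sample means
    rbar = sum_r / k         # average of the k sample ranges
    xbar_chart = (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar)
    r_chart = (d3 * rbar, rbar, d4 * rbar)
    return xbar_chart, r_chart

# Values from the question: 25 samples of size 4, A2 = 0.729, D3 = 0, D4 = 2.282.
xbar_chart, r_chart = xbar_r_limits(4010, 72, 25, 0.729, 0, 2.282)
print("X-bar chart (LCL, CL, UCL):", [round(v, 3) for v in xbar_chart])
print("R chart     (LCL, CL, UCL):", [round(v, 3) for v in r_chart])
```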

Question 4. The frequency distribution of accidents in a factory over the last 50 weeks is shown below:
No. of Accidents	No. of Weeks
0 - 5	8
5 - 10	22
10 - 15	10
15 - 20	8
20 - 25	2
Draw the histogram and calculate the average number of accidents per week. (5 Marks)

Solution:

The frequency distribution of the accidental data of the factory for the last 50 weeks is as follows:

No. of Accidents	No. of Weeks
0 - 5	8
5 - 10	22
10 - 15	10
15 - 20	8
20 - 25	2

No. of Accidents	Midpoint (m)	No. of Weeks (f)	m × f
0 - 5	2.5	8	20
5 - 10	7.5	22	165
10 - 15	12.5	10	125
15 - 20	17.5	8	140
20 - 25	22.5	2	45

Total of m × f = 20 + 165 + 125 + 140 + 45 = 495

Average number of accidents per week:

$\text{Average} = \frac{\text{Total of } m \times f}{\text{Total number of weeks}} = \frac{495}{50} = 9.9$

So, the average number of accidents per week is 9.9.
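For the histogram, the class boundaries 0, 5, 10, 15, 20 and 25 go on the horizontal axis and the number of weeks on the vertical axis; because every class has the same width (5 accidents), each bar's height is simply its frequency. The Python sketch below prints a rough text version of that histogram and recomputes the grouped mean. It is only a stand-in for a drawn or plotted chart, and the names are illustrative.

```python
# Class intervals (accidents per week) and frequencies (number of weeks).
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
weeks = [8, 22, 10, 8, 2]

midpoints = [(lo, hi) and (lo + hi) / 2 for lo, hi in classes]
total_weeks = sum(weeks)
mean = sum(m * f for m, f in zip(midpoints, weeks)) / total_weeks

# Text histogram: equal class widths, so bar length is proportional to frequency.
for (lo, hi), f in zip(classes, weeks):
    print(f"{lo:>2}-{hi:<2} | {'#' * f} {f}")
print(f"Average accidents per week = {mean}")
```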

Question 5. In order to test whether there is any significant difference between the proportion of safety consciousness of men and women, while driving a car, a study was conducted. The study includes a sample of 300 men and 300 women. Out of 300 men, 130 said that they used seat belts, and out of 300 women, 90 said that they used seat belts. Based on the given data, test the claim that there is no significant difference between the proportion of safety consciousness of men and women, while driving a car at 5% level of significance. (Given that $Z_{0.025} = 1.96$ ).

Solution:

Step 1: Formulate the hypotheses.
Null hypothesis: $H_0: p_1 = p_2$ (the proportions of men and women who use seat belts are equal)
Alternative hypothesis: $H_a: p_1 \neq p_2$ (the two proportions differ)

Step 2: Calculate the sample proportions.
$\hat{p}_1 = \frac{130}{300} = 0.4333$ (men)
$\hat{p}_2 = \frac{90}{300} = 0.3$ (women)

Step 3: Calculate the pooled proportion.
$\hat{p} = \frac{130 + 90}{300 + 300} = \frac{220}{600} = 0.3667$

Step 4: Calculate the standard error.
$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
$SE = \sqrt{0.3667 \times 0.6333 \left(\frac{1}{300} + \frac{1}{300}\right)}$
$SE = \sqrt{0.3667 \times 0.6333 \times 0.006667}$
$SE = \sqrt{0.001548}$
$SE \approx 0.0393$

Step 5: Calculate the z-statistic.
$z = \frac{\hat{p}_1 - \hat{p}_2}{SE}$
$z = \frac{0.4333 - 0.3}{0.0393}$
$z = \frac{0.1333}{0.0393}$
$z \approx 3.39$

Step 6: Compare the z-statistic to the critical value.
The critical value at 5% level of significance for a two-tailed test is $Z_{0.025} = 1.96$ .
Since $|3.39| > 1.96$ , we reject the null hypothesis.

Conclusion:
There is a significant difference between the proportion of safety consciousness of men and women while driving a car at the 5% level of significance.
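A hedged Python sketch of the pooled two-proportion z-test, useful for checking the rounding in the steps above; `two_proportion_z` is a hypothetical helper name. Carrying full precision gives $z \approx 3.39$.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = two_proportion_z(130, 300, 90, 300)  # men, then women
z_crit = 1.96  # Z_0.025 from the question (two-tailed, 5% level)
print(f"z = {z:.2f}; reject H0: {abs(z) > z_crit}")
```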

Question 6. A company manufactures two types of machines (A and B). The manager of the company tests a random sample of 50 machines of Type A and 60 machines of Type B and finds the following:
	Mean Life (in hours)	Standard Deviation (in hours)
Type A	1300	50
Type B	1200	60

Obtain a 99% confidence interval for the difference between the average lives of the two types of machines. (Given that $Z_{0.005} = 2.58$ ).

Solution:

Step 1: Identify the given data.
Sample size for Type A, $n_1 = 50$
Sample size for Type B, $n_2 = 60$
Mean life for Type A, $\overline{X}_1 = 1300$
Mean life for Type B, $\overline{X}_2 = 1200$
Standard deviation for Type A, $S_1 = 50$
Standard deviation for Type B, $S_2 = 60$

Step 2: Calculate the standard error of the difference between means.
$SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
$SE = \sqrt{\frac{50^2}{50} + \frac{60^2}{60}}$
$SE = \sqrt{\frac{2500}{50} + \frac{3600}{60}}$
$SE = \sqrt{50 + 60}$
$SE = \sqrt{110}$
$SE = 10.488$

Step 3: Calculate the difference between the sample means.
$(\overline{X}_1 - \overline{X}_2) = 1300 - 1200 = 100$

Step 4: Determine the margin of error using the z-value for 99% confidence interval.
$ME = Z_{0.005} \times SE$
$ME = 2.58 \times 10.488$
$ME = 27.059$

Step 5: Calculate the confidence interval.
$(\overline{X}_1 - \overline{X}_2) \pm ME$
$100 \pm 27.059$
$72.941 \text{ to } 127.059$

Conclusion:
The 99% confidence interval for the difference in the average life of the two types of machines is from 72.941 hours to 127.059 hours.
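The large-sample (z-based) interval can be sketched in Python as follows. Like the solution, it uses the sample standard deviations in place of the population values; `diff_means_ci` is a hypothetical name.

```python
import math

def diff_means_ci(m1, s1, n1, m2, s2, n2, z):
    """Large-sample confidence interval for mu1 - mu2 using sample SDs."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # SE of the difference in means
    diff = m1 - m2
    me = z * se                                  # margin of error
    return diff - me, diff + me

low, high = diff_means_ci(1300, 50, 50, 1200, 60, 60, 2.58)
print(f"99% CI: {low:.3f} to {high:.3f} hours")
```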

Question 7. To enforce the speed limit at four different locations in the city, the Police plans to install radar traps at each of the locations L1, L2, L3 and L4. The radar traps at each of the locations L1, L2, L3 and L4 are operated 40%, 30%, 20% and 30% of the time. If a person who is speeding on his way to work has probabilities of 0.2, 0.1, 0.5 and 0.2 respectively, of passing through these locations, what is the probability that he will receive a speeding ticket? Find also the probability that he will receive a speeding ticket at locations L1, L2, L3 and L4.

Solution:

Step 1: Identify the given data.
Let $L_i$ denote the event that the person passes through location $L_i$, and let $S$ denote the event that he receives a speeding ticket. A speeding driver is ticketed at $L_i$ only when the radar trap there is operating, so $P(S|L_i)$ equals the fraction of time that trap is operated.
Probability of being ticketed when passing through L1, $P(S|L_1) = 0.4$
Probability of being ticketed when passing through L2, $P(S|L_2) = 0.3$
Probability of being ticketed when passing through L3, $P(S|L_3) = 0.2$
Probability of being ticketed when passing through L4, $P(S|L_4) = 0.3$
Probability of passing through L1, $P(L_1) = 0.2$
Probability of passing through L2, $P(L_2) = 0.1$
Probability of passing through L3, $P(L_3) = 0.5$
Probability of passing through L4, $P(L_4) = 0.2$

Step 2: Calculate the probability of receiving a speeding ticket at each location.
Probability of passing through L1 and being ticketed there, $P(L_1 \cap S) = P(L_1) \times P(S|L_1) = 0.2 \times 0.4 = 0.08$
Probability of passing through L2 and being ticketed there, $P(L_2 \cap S) = P(L_2) \times P(S|L_2) = 0.1 \times 0.3 = 0.03$
Probability of passing through L3 and being ticketed there, $P(L_3 \cap S) = P(L_3) \times P(S|L_3) = 0.5 \times 0.2 = 0.10$
Probability of passing through L4 and being ticketed there, $P(L_4 \cap S) = P(L_4) \times P(S|L_4) = 0.2 \times 0.3 = 0.06$

Step 3: Calculate the total probability of receiving a speeding ticket (law of total probability).
$P(S) = P(L_1)P(S|L_1) + P(L_2)P(S|L_2) + P(L_3)P(S|L_3) + P(L_4)P(S|L_4)$
$P(S) = 0.08 + 0.03 + 0.10 + 0.06$
$P(S) = 0.27$

Conclusion:
The probability that the person will receive a speeding ticket is 0.27.
The probabilities of receiving a speeding ticket at locations L1, L2, L3, and L4 are 0.08, 0.03, 0.10, and 0.06 respectively.
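A short Python sketch of the law-of-total-probability calculation, with illustrative dictionary names; it assumes, as the solution does, that the person passes through exactly one of the four locations (the given probabilities sum to 1).

```python
# P(passing through L_i) and P(trap operating at L_i), as given in the question.
p_pass = {"L1": 0.2, "L2": 0.1, "L3": 0.5, "L4": 0.2}
p_trap = {"L1": 0.4, "L2": 0.3, "L3": 0.2, "L4": 0.3}

# Joint probability of passing through L_i and being ticketed there.
joint = {loc: p_pass[loc] * p_trap[loc] for loc in p_pass}
# Law of total probability: P(S) is the sum of the joint probabilities.
p_ticket = sum(joint.values())

print({loc: round(p, 2) for loc, p in joint.items()})
print(f"P(S) = {p_ticket:.2f}")
```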

Question 8. Find and plot the regression line of y on x for the data given below:
Speed (km/hr)	30	40	50	60
Stopping Distance (in feet)	160	240	330	435

Solution:

Speed (X): 30, 40, 50, 60
Stopping Distance (Y): 160, 240, 330, 435

To find the regression line $Y = a + bX$ , we need to calculate the slope $b$ and the intercept $a$ .

Step 1: Calculate the means of X and Y.
$\overline{X} = \frac{\sum X}{n} = \frac{30 + 40 + 50 + 60}{4} = 45$
$\overline{Y} = \frac{\sum Y}{n} = \frac{160 + 240 + 330 + 435}{4} = 291.25$

Step 2: Calculate the slope $b$ .
$b = \frac{\sum (X - \overline{X})(Y - \overline{Y})}{\sum (X - \overline{X})^2}$

Calculate $\sum (X - \overline{X})(Y - \overline{Y})$ :
$\sum (X - \overline{X})(Y - \overline{Y}) = (30 - 45)(160 - 291.25) + (40 - 45)(240 - 291.25) + (50 - 45)(330 - 291.25) + (60 - 45)(435 - 291.25)$
$= (-15)(-131.25) + (-5)(-51.25) + (5)(38.75) + (15)(143.75)$
$= 1968.75 + 256.25 + 193.75 + 2156.25$
$= 4575$

Calculate $\sum (X - \overline{X})^2$ :
$\sum (X - \overline{X})^2 = (30 - 45)^2 + (40 - 45)^2 + (50 - 45)^2 + (60 - 45)^2$
$= (-15)^2 + (-5)^2 + (5)^2 + (15)^2$
$= 225 + 25 + 25 + 225$
$= 500$

Calculate the slope $b$ :
$b = \frac{4575}{500} = 9.15$

Step 3: Calculate the intercept $a$ .
$a = \overline{Y} - b\overline{X}$
$a = 291.25 - 9.15 \times 45$
$a = 291.25 - 411.75$
$a = -120.5$

Step 4: Form the regression equation.
$Y = a + bX$
$Y = -120.5 + 9.15X$

The regression line of y on x is $Y = -120.5 + 9.15X$ .

Step 5: Plot the regression line.

To plot the regression line, use the equation $Y = -120.5 + 9.15X$ to calculate Y for various values of X. Then, plot the points and draw the line through them.

Using the equation for a few points:
For $X = 30$ , $Y = -120.5 + 9.15 \times 30 = 154$
For $X = 40$ , $Y = -120.5 + 9.15 \times 40 = 245.5$
For $X = 50$ , $Y = -120.5 + 9.15 \times 50 = 337$
For $X = 60$ , $Y = -120.5 + 9.15 \times 60 = 428.5$

These points can be used to plot the regression line on a graph.
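The least-squares steps above can be sketched in Python using the same deviation sums; `least_squares` is a hypothetical helper name. The fitted values it prints are the points to draw the line through; a plotting library such as matplotlib could be used for the chart itself, but it is left out here to keep the example dependency-free.

```python
def least_squares(xs, ys):
    """Return (a, b) for the regression line of y on x, y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b

speed = [30, 40, 50, 60]         # km/hr
distance = [160, 240, 330, 435]  # feet
a, b = least_squares(speed, distance)
fitted = [a + b * x for x in speed]
print(f"Y = {a:.2f} + {b:.2f}X")
print("Fitted values:", [round(y, 2) for y in fitted])
```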

Question 9. A chemical firm wants to determine how four catalysts differ in yield. The firm runs the experiment in three of its plants, namely A, B and C. In each plant, the yield is measured with each catalyst. The yields (in quintals) are as follows:
Plant	Catalyst 1	Catalyst 2	Catalyst 3	Catalyst 4
A	2	1	2	4
B	3	2	1	3
C	1	3	3	1
Perform an ANOVA and comment on whether the yield due to a particular catalyst is significant or not at 5% level of significance (Given $F_{3,6} = 4.76$ ).

Solution:

Since every catalyst is tried once in every plant, the plants act as blocks, and the appropriate analysis is a two-way ANOVA with one observation per cell. This is also what the given critical value $F_{3,6}$ implies: catalysts have $4 - 1 = 3$ degrees of freedom and the error term has $(4 - 1)(3 - 1) = 6$.

Step 1: Calculate the totals and the correction factor

Grand total: $G = 2 + 1 + 2 + 4 + 3 + 2 + 1 + 3 + 1 + 3 + 3 + 1 = 26$ , with $N = 12$ observations.
$CF = \frac{G^2}{N} = \frac{26^2}{12} = \frac{676}{12} = 56.33$

Catalyst totals:

$T_1 = 2 + 3 + 1 = 6$
$T_2 = 1 + 2 + 3 = 6$
$T_3 = 2 + 1 + 3 = 6$
$T_4 = 4 + 3 + 1 = 8$

Plant totals: $P_A = 9$ , $P_B = 9$ , $P_C = 8$

Step 2: Calculate the sums of squares

Total: $SST = \sum Y_{ij}^2 - CF = 68 - 56.33 = 11.67$
Catalysts: $SSC = \frac{T_1^2 + T_2^2 + T_3^2 + T_4^2}{3} - CF = \frac{36 + 36 + 36 + 64}{3} - 56.33 = 57.33 - 56.33 = 1.00$
Plants: $SSP = \frac{P_A^2 + P_B^2 + P_C^2}{4} - CF = \frac{81 + 81 + 64}{4} - 56.33 = 56.50 - 56.33 = 0.17$
Error: $SSE = SST - SSC - SSP = 11.67 - 1.00 - 0.17 = 10.50$

Step 3: Calculate the mean squares

$MSC = \frac{SSC}{4 - 1} = \frac{1.00}{3} = 0.33$
$MSP = \frac{SSP}{3 - 1} = \frac{0.17}{2} = 0.08$
$MSE = \frac{SSE}{(4 - 1)(3 - 1)} = \frac{10.50}{6} = 1.75$

Step 4: Calculate the F-statistic for catalysts

$F = \frac{MSC}{MSE} = \frac{0.33}{1.75} = 0.19$

Step 5: Compare the calculated F-statistic with the critical value

Since the calculated F-statistic $(0.19)$ is less than the critical value $F_{3,6} = 4.76$ , we fail to reject the null hypothesis that the four catalysts give the same mean yield.

Conclusion: There is no significant difference in the yield due to the different catalysts at the 5% level of significance.
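A minimal Python sketch of a two-way ANOVA without replication (plants as blocks, catalysts as treatments), which is the design that yields the $(3, 6)$ degrees of freedom of the given critical value. The variable names are illustrative and only the standard library is used.

```python
# Yields: rows are plants A, B, C; columns are catalysts 1-4.
data = [
    [2, 1, 2, 4],  # Plant A
    [3, 2, 1, 3],  # Plant B
    [1, 3, 3, 1],  # Plant C
]
r = len(data)     # number of plants (blocks)
c = len(data[0])  # number of catalysts (treatments)
n = r * c

grand = sum(sum(row) for row in data)
cf = grand ** 2 / n                                  # correction factor
sst = sum(y ** 2 for row in data for y in row) - cf  # total SS
col_totals = [sum(row[j] for row in data) for j in range(c)]
row_totals = [sum(row) for row in data]
ss_cat = sum(t ** 2 for t in col_totals) / r - cf    # between catalysts
ss_plant = sum(t ** 2 for t in row_totals) / c - cf  # between plants
sse = sst - ss_cat - ss_plant                        # residual (error) SS

df_cat, df_err = c - 1, (r - 1) * (c - 1)
f_cat = (ss_cat / df_cat) / (sse / df_err)
print(f"SS: catalysts={ss_cat:.2f}, plants={ss_plant:.2f}, error={sse:.2f}")
print(f"F = {f_cat:.2f} on ({df_cat}, {df_err}) df; significant: {f_cat > 4.76}")
```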

Question 10. In order to study the impact of air pollution on households, a random sample of 200 households was selected from each of two communities. The respondent in each household was asked whether or not anyone in the household was bothered by air pollution. The responses are tabulated below (Given $\chi^2_{0.05,1} = 3.841$ , $\alpha = 0.05$ ):
Community	Yes	No	Total
I	43	157	200
II	81	119	200
Total	124	276	400
Can the researcher conclude that the 2 communities are bothered differently by air pollution?

Solution:

Step 1: State the Hypotheses

Null Hypothesis ( $H_0$ ): There is no significant difference between the two communities in terms of being bothered by air pollution.
Alternative Hypothesis ( $H_a$ ): There is a significant difference between the two communities in terms of being bothered by air pollution.

Step 2: Calculate the Expected Frequencies

The expected frequency for each cell in the table is calculated using the formula:
$E_{ij} = \frac{(Row\ Total) \times (Column\ Total)}{Grand\ Total}$

For Community I (Yes):
$E_{11} = \frac{(Total\ for\ Community\ I) \times (Total\ Yes)}{Grand\ Total} = \frac{200 \times 124}{400} = 62$

For Community I (No):
$E_{12} = \frac{(Total\ for\ Community\ I) \times (Total\ No)}{Grand\ Total} = \frac{200 \times 276}{400} = 138$

For Community II (Yes):
$E_{21} = \frac{(Total\ for\ Community\ II) \times (Total\ Yes)}{Grand\ Total} = \frac{200 \times 124}{400} = 62$

For Community II (No):
$E_{22} = \frac{(Total\ for\ Community\ II) \times (Total\ No)}{Grand\ Total} = \frac{200 \times 276}{400} = 138$

Step 3: Calculate the Chi-Square Statistic

The chi-square statistic ( $\chi^2$ ) is calculated using the formula:
$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

where $O_{ij}$ are the observed frequencies and $E_{ij}$ are the expected frequencies.

Let's compute it:

For Community I (Yes):
$\chi^2_{11} = \frac{(43 - 62)^2}{62} = \frac{(-19)^2}{62} = \frac{361}{62} \approx 5.82$

For Community I (No):
$\chi^2_{12} = \frac{(157 - 138)^2}{138} = \frac{19^2}{138} = \frac{361}{138} \approx 2.62$

For Community II (Yes):
$\chi^2_{21} = \frac{(81 - 62)^2}{62} = \frac{19^2}{62} = \frac{361}{62} \approx 5.82$

For Community II (No):
$\chi^2_{22} = \frac{(119 - 138)^2}{138} = \frac{(-19)^2}{138} = \frac{361}{138} \approx 2.62$

Summing these, we get:
$\chi^2 = 5.82 + 2.62 + 5.82 + 2.62 = 16.88$

Step 4: Compare the Chi-Square Statistic to the Critical Value

The table has $(2 - 1)(2 - 1) = 1$ degree of freedom. Given $\chi^2_{0.05,1} = 3.841$ , since $16.88 > 3.841$ , we reject the null hypothesis.

Conclusion

The researcher can conclude that the two communities are bothered differently by air pollution at the 0.05 significance level.
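A hedged Python sketch of the Pearson chi-square test of independence, written to match the hand calculation (no Yates continuity correction is applied); `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    No continuity (Yates) correction is applied, matching the hand calculation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # E_ij
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[43, 157],  # Community I: Yes, No
            [81, 119]]  # Community II: Yes, No
chi2, df = chi_square_statistic(observed)
critical = 3.841  # chi-square(0.05, 1) from the question
print(f"chi2 = {chi2:.2f} on {df} df; reject H0: {chi2 > critical}")
```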
